home *** CD-ROM | disk | FTP | other *** search
- Path: keats.ugrad.cs.ubc.ca!not-for-mail
- From: c2a192@ugrad.cs.ubc.ca (Kazimir Kylheku)
- Newsgroups: comp.lang.c
- Subject: Re: unique id for a string
- Date: 26 Feb 1996 06:46:09 -0800
- Organization: Computer Science, University of B.C., Vancouver, B.C., Canada
- Message-ID: <4gsh3hINNf3b@keats.ugrad.cs.ubc.ca>
- References: <1996Feb16.175601.114182@kuhub.cc.ukans.edu> <3129dd39.961626@news.iquest.net> <4gfpveINNe68@keats.ugrad.cs.ubc.ca> <4gih8a$b62@madeline.INS.CWRU.Edu>
- NNTP-Posting-Host: keats.ugrad.cs.ubc.ca
-
- In article <4gih8a$b62@madeline.INS.CWRU.Edu>,
- Michael A. Balfour <mab22@po.CWRU.Edu> wrote:
- >
- >In a previous article, c2a192@ugrad.cs.ubc.ca (Kazimir Kylheku) says:
- >
- >>In article <3129dd39.961626@news.iquest.net>,
- >>Robert B. Clark <rclark@iquest.net> wrote:
- >> >On 16 Feb 96 17:56:01 CST, anh@kuhub.cc.ukans.edu wrote:
- >> >
- >> >>HELLO = H * 26^4 + E * 26 ^3 + L * 26^2 + L * 26^1 + E * 26^0
- >> >>
- >> >>But the id is way too big. I need something that fits within a 32-bits int
- >> >>or a 64-bits long.
- >> >
- >> >You're thinking way too hard, perhaps. Look up the algorithm for CRCs
- >> >(cyclic redundancy checks/codes) in Snippets or most any other popular
- >> >programmer's reference.
- >>
- >>But he wants a _unique_ ID. This is not possible, since the space of possible
- >>words is much larger than the space of 32- or even 64-bit integers. Even for a
- >>limited dictionary of around 200,000 words, a function that _guarantees_ a
- >>unique hash for each indentifier into a 32-bit space is difficult to find.
- >
- >Actually, it's not *that* difficult to find. Try running gperf from GNU
- >or reading the paper cited in my previous post in this subject. The
- >results in the paper claim to be able to find a perfect hash function
- >for 262,144 18-character words in 16.087 seconds on a Sun Sparc 2.
- >Doesn't sound that difficult to me. :)
- >
- >>One way to get a unique mapping is to simply assign increasing numbers to the
- >>200,000+ words, and use a hashing technique to quickly look up a word given its
- >>integer tag, and look up the tag given the word. With the proper
- >>represenatation, both operations can be done in "constant time".
- >
- >Once again, using a perfect minimal hash function will provide an easy
- >method for accomplishing this.
-
- Definitely. But that assumes that the dictionary is not open to new members.
- Otherwise you have to recompute the hashing function each time you add a
- member. Does the paper discuss such ``loose ends''?
-
- I am probably going to have a look at it, since perfect hashing functions are
- interesting things. :)
- --
-
-